How to Thematically Segment Texts by Using Lexical Cohesion?

Author

  • Olivier Ferret
Abstract

This article outlines a quantitative method for segmenting texts into thematically coherent units. The method relies on a network of lexical collocations to compute the thematic coherence of the different parts of a text from the lexical cohesiveness of their words. We also present the results of an experiment on locating the boundaries between a series of concatenated texts.

1 Introduction

Several quantitative methods exist for thematically segmenting texts. Most of them are based on the following assumption: the thematic coherence of a text segment finds expression at the lexical level. Hearst (1997) and Nomoto and Nitta (1994) detect this coherence through patterns of lexical cooccurrence. Morris and Hirst (1991) and Kozima (1993) find topic boundaries in texts by using lexical cohesion.

The first methods are applied to texts, such as expository texts, whose vocabulary is often very specific. Because a concept is always expressed by the same word in such texts, word repetitions are thematically significant. The use of lexical cohesion makes it possible to bypass the problem posed by texts, such as narratives, in which a concept is often expressed by different means. However, this second approach requires knowledge about the cohesion between words. Morris and Hirst (1991) extract this knowledge from a thesaurus. Kozima (1993) exploits a lexical network built from a machine readable dictionary (MRD).

This article presents a method for thematically segmenting texts by using knowledge about lexical cohesion that has been built automatically. This knowledge takes the form of a network of lexical collocations. We claim that this network is as suitable as a thesaurus or an MRD for segmenting texts. Moreover, building it for a specific domain or for another language is quick.
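To make the idea of scoring thematic coherence from a collocation network more concrete, here is a minimal sketch. It is not the procedure described in the article, whose details are not given in this abstract. It assumes a hand-made toy network (`TOY_NETWORK`), treats a repeated word as fully cohesive, and scores each gap between tokens by the average link weight between the words on either side. All names, weights, and the window size are hypothetical.

```python
from itertools import product

# Hypothetical toy collocation network: unordered word pair -> cohesion weight.
# A real network would be built automatically from corpus cooccurrence counts.
TOY_NETWORK = {
    frozenset(("bank", "loan")): 0.8,
    frozenset(("loan", "money")): 0.9,
    frozenset(("bank", "money")): 0.7,
    frozenset(("river", "water")): 0.9,
    frozenset(("water", "fish")): 0.8,
    frozenset(("river", "fish")): 0.6,
}

# Two short "texts" concatenated: a finance topic followed by a river topic.
TOKENS = ["bank", "loan", "money", "river", "water", "fish"]


def link_weight(network, a, b):
    """Cohesion between two words: 1.0 for a repetition, otherwise the
    collocation weight from the network (0.0 when the pair is absent)."""
    if a == b:
        return 1.0
    return network.get(frozenset((a, b)), 0.0)


def gap_cohesion(tokens, gap, window, network):
    """Average link weight between the `window` words before position `gap`
    and the `window` words starting at `gap`."""
    left = tokens[max(0, gap - window):gap]
    right = tokens[gap:gap + window]
    pairs = list(product(left, right))
    if not pairs:
        return 0.0
    return sum(link_weight(network, a, b) for a, b in pairs) / len(pairs)


def weakest_gap(tokens, window, network):
    """Return the gap position with the lowest cohesion, taken here as the
    single most likely thematic boundary."""
    gaps = range(1, len(tokens))
    return min(gaps, key=lambda g: gap_cohesion(tokens, g, window, network))


if __name__ == "__main__":
    for g in range(1, len(TOKENS)):
        print(g, round(gap_cohesion(TOKENS, g, 2, TOY_NETWORK), 3))
    print("boundary before token index", weakest_gap(TOKENS, 2, TOY_NETWORK))
```

With this toy data and a window of 2, the gap between "money" and "river" has no collocation links across it, so it is selected as the boundary. A practical system would need smoothing, a threshold or local-minimum test to find several boundaries, and a network large enough to cover the vocabulary.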

Similar resources

Lexical Cohesion in English and Persian Abstracts

This study compares and contrasts lexical cohesion in English and Persian abstracts of Iranian medical students’ theses to appreciate textualization processes in the two languages. For this purpose, one hundred English and Persian abstracts were selected randomly and analyzed based on Seddigh and Yarmohamadi’s (1996) lexical cohesion framework, a version of Halliday and Hasan’s (1976) and Halli...

Disunity in Cohesion: How Purpose Affects Methods and Results When Analyzing Lexical Cohesion

Lexical Cohesion is a commonly studied linguistic feature as it is easily identified from the surface of a text. However, the purposes for studying lexical cohesion are varied, and each purpose requires different methods. This study analyzes two short movie review texts for four different research purposes using lexical cohesion: text evaluation, text segmentation, text summarization, and text ...

Lexical Cohesion and Literariness in Malcolm X's "The Ballot or the Bullet"

This paper unearths the contribution of lexical cohesion to the textuality and overall meaning of Malcolm X’s speech 'The Ballot or the Bullet'. Drawing on Halliday and Hasan’s (1976) and Hoey’s (1991) theory of cohesion, specifically lexical cohesion, whose main thrust is the role of lexical items in not only contributing to meaning but also serving as cohesive ties, the paper discusses how ...

WordNet for Lexical Cohesion Analysis

This paper describes an approach to the analysis of lexical cohesion using WordNet. The approach automatically annotates texts with potential cohesive ties, and supports various thesaurus based and text based search facilities as well as different views on the annotated texts. The purpose is to be able to investigate large amounts of text in order to get a clearer idea to what extent semantic r...

A Thematic Segmentation Procedure for Extracting Semantic Domains from Texts

Thematic analysis is essential for a lot of Natural Language Processing (NLP) applications, such as text summarization or information extraction. It is a two-dimensional process which has both to identify the thematic segments of a text and to recognize the semantic domain concerned by each of them. This second task requires having a representation of these domains. Such representations are bui...

Publication date: 1998